4 inputs:
3 time series outputs (each with 512 timepoints):
Observations were produced by running the model at some \(\textbf{x}^*\), and adding noise sampled from a Matérn 3/2 kernel, with priors on the variance and correlation length:
There was also uncorrelated Normal iid noise added, with variance \(\sigma^2 = 10^{-6}\).
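As an illustration, synthetic observations of this form can be generated as below. This is a minimal Python sketch: the model run is a stand-in signal, and the kernel parameters are set to the true values revealed later in the document (\(\ell = 0.1, \sigma^2 = 5.8\)); `matern32_cov` is an illustrative helper name.

```python
import numpy as np

def matern32_cov(t, sigma2, ell):
    """Matérn 3/2 covariance matrix over 1-D locations t."""
    d = np.abs(t[:, None] - t[None, :])
    a = np.sqrt(3.0) * d / ell
    return sigma2 * (1.0 + a) * np.exp(-a)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 512)            # 512 timepoints, scaled to [0, 1]
sigma2_m, ell = 5.8, 0.1                  # true values, revealed later in the document
model_run = np.sin(2 * np.pi * t)         # placeholder for the model output at x*

K = matern32_cov(t, sigma2_m, ell)
correlated = rng.multivariate_normal(np.zeros(len(t)), K)
iid = rng.normal(0.0, np.sqrt(1e-6), len(t))   # uncorrelated iid noise, variance 1e-6
obs = model_run + correlated + iid
```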
Sampled a 100-member Latin hypercube (LHC) design (grey), comparing to the observations (red) and the true ‘best’ model run (green - NOT KNOWN UNTIL THE END):
Alternatively, plotting residuals:
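A sketch of how such a design could be drawn, using `scipy.stats.qmc`; the input bounds below are placeholders for illustration (the actual prior ranges are not stated in this section).

```python
import numpy as np
from scipy.stats import qmc

# 100-member Latin hypercube over the 4 inputs; one point per stratum in each dimension
sampler = qmc.LatinHypercube(d=4, seed=1)
unit = sampler.random(n=100)                    # points in [0, 1)^4
lower = [150000.0, 0.85, 15.0, 15.0]            # assumed bounds for kMV, alpha, lrrA, lrrV
upper = [350000.0, 0.92, 45.0, 45.0]            # (placeholders, not the real prior)
design = qmc.scale(unit, lower, upper)          # shape (100, 4)
```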
Plot basis and coefficients for e.g., flow 1:
## Input value Output coeff
## 1 kMV -0.4105579 c1 -57.654972
## 2 kMV 0.9304734 c1 -20.741136
## 3 kMV 0.2210836 c1 47.747129
## 4 kMV -0.6875105 c1 -51.121585
## 5 kMV 0.2702968 c1 21.977666
## 6 kMV -0.2430006 c1 -5.173414
Fitted emulators separately for flow1, flow2 and pressure, rather than combining them, to give a bit more precision for each and to make it easier to handle the error (each output has a different sample from the covariance function, uncorrelated with the others, so the errors may have different magnitudes). This may not make any difference; perhaps go back and compare later.
However, this choice makes it easy to identify e.g. the best runs for each output, which may be useful, rather than the best runs being dominated by whichever flow/pressure has the highest magnitude/variability.
In all cases, emulation is fairly accurate, and for each emulated vector the variance is low relative to the spread of coefficients. It may have been possible to emulate additional vectors.
Pressure was the least variable, and only 2 vectors were required to explain \(>99\%\) of the variance. Added a 3rd anyway:
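The basis construction used here (project each 512-timepoint output onto a few vectors, then emulate the coefficients) can be sketched via an SVD of the centred ensemble. This is a Python illustration with placeholder data chosen to have low rank; the 99% threshold matches the pressure case.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 512)
# placeholder ensemble: 100 runs, each a random combination of two fixed signals
runs = (rng.normal(size=(100, 1)) * np.sin(2 * np.pi * t)
        + rng.normal(size=(100, 1)) * np.cos(2 * np.pi * t))

mean = runs.mean(axis=0)
centred = runs - mean

# SVD of the centred ensemble: rows of Vt are orthonormal basis vectors
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)
q = int(np.searchsorted(explained, 0.99)) + 1   # smallest q explaining >99%
basis = Vt[:q]                                   # (q, 512) basis vectors
coeffs = centred @ basis.T                       # (100, q): emulate each column
recon = coeffs @ basis + mean                    # truncated reconstruction
```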
For each of the 3 outputs, calculated the multivariate implausibility.
For the error/discrepancy matrix, combined these into a single variance matrix (often written \(\textbf{W} = \boldsymbol{\Sigma}_e + \boldsymbol{\Sigma}_{\eta}\)). The issue here is that this matrix needs to be fixed. If we fix the correlation length \(\ell\), so that the correlation structure of \(\boldsymbol{\Sigma}_{\eta}\) is fixed, then changes in \(\sigma^2_m\) simply scale this matrix and hence \(\textbf{W}\) (essentially ignoring the effect of \(\boldsymbol{\Sigma}_e\), as \(10^{-6} \ll \sigma^2_m\) in general).
This means that, although we don’t know \(\sigma^2_m\), we can obtain a lower bound on the implausibility of \(\textbf{x}\) (for a fixed \(\ell\)) by setting \(\sigma^2_m = 22.5\), i.e. the maximum value from its prior. Any decrease in \(\sigma^2_m\) will increase \(\mathcal{I}(\textbf{x})\) (again, with everything else fixed), and hence if we rule out \(\textbf{x}\) with \(\sigma^2_m = 22.5\), we would also rule it out for any lower value, so \(\textbf{x}\) is not possible across the full support of \(\sigma^2_m\).
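To spell out the scaling step: write \(\textbf{r} = \textbf{z} - \mathbb{E}[f(\textbf{x})]\), let \(C(\ell)\) be the Matérn 3/2 correlation matrix, and use the standard multivariate implausibility form; then, neglecting \(\boldsymbol{\Sigma}_e\) as above,

\[
\textbf{W} \approx \sigma^2_m C(\ell), \qquad
\mathcal{I}^2(\textbf{x}) = \textbf{r}^T \textbf{W}^{-1} \textbf{r} = \frac{\textbf{r}^T C(\ell)^{-1} \textbf{r}}{\sigma^2_m},
\]

so for fixed \(\ell\), \(\mathcal{I}^2(\textbf{x})\) is decreasing in \(\sigma^2_m\), and \(\sigma^2_m = 22.5\) gives the lower bound. (If emulator variance is also added to \(\textbf{W}\), the monotonicity in \(\sigma^2_m\) still holds.)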
As an initial exploration, let’s allow a fixed set of choices for \(\ell\), namely \(\ell \in \{ 0, 0.05, 0.1, \ldots, 0.85 \}\), covering the prior range. We therefore just need to construct 18 versions of \(\textbf{W}\), and then calculate the implausibility for each (for flow1, flow2 and pressure).
Given this, what percentage of parameter space is NROY?
## l Flow1 Flow2 Pressure All
## 1 1.0e-10 0.91198 0.10429 0.23242 0.07443
## 2 5.0e-02 1.00000 1.00000 0.96080 0.96080
## 3 1.0e-01 1.00000 1.00000 0.98982 0.98982
## 4 1.5e-01 0.97167 0.52023 0.76542 0.49801
## 5 2.0e-01 0.00000 0.00000 0.00000 0.00000
## 6 2.5e-01 0.00000 0.00000 0.00000 0.00000
## 7 3.0e-01 0.00000 0.00000 0.00000 0.00000
## 8 3.5e-01 0.00000 0.00000 0.00000 0.00000
## 9 4.0e-01 0.00000 0.00000 0.00000 0.00000
## 10 4.5e-01 0.00000 0.00000 0.00000 0.00000
## 11 5.0e-01 0.00000 0.00000 0.00000 0.00000
## 12 5.5e-01 0.00000 0.00000 0.00000 0.00000
## 13 6.0e-01 0.00000 0.00000 0.00000 0.00000
## 14 6.5e-01 0.00000 0.00000 0.00000 0.00000
## 15 7.0e-01 0.00000 0.00000 0.00000 0.00000
## 16 7.5e-01 0.00000 0.00000 0.00000 0.00000
## 17 8.0e-01 0.00000 0.00000 0.00000 0.00000
## 18 8.5e-01 0.00000 0.00000 0.00000 0.00000
This seems like fairly strong evidence that \(\ell < 0.2\). If we increased the variance, we would get non-empty NROY spaces for higher correlation lengths; however, this is not allowed by the prior. It’s likely that the true variance is lower than 22.5, so if we knew it, these NROY spaces would be smaller.
Now we have an issue: which NROY space is correct? Any of these \(\ell\)s could be true, and we can’t arbitrarily reduce the variance to shrink the NROY space (22.5 might be correct). If the true answer is \(\ell = 0.1\) and \(\sigma^2_m = 22.5\), all (or almost all) of parameter space is in NROY. This suggests that the variance is probably too high, but it doesn’t seem reasonable to discount this possibility - it can certainly be considered not implausible.
We can’t necessarily rule anything out, but let’s target the new wave on the better parts of space, so that we can then improve emulation in this region. Hence defined ‘NROY’ as any parameter setting in the lowest 5% of implausibilities, for any \(\ell \leq 0.2\) (adding a bit of tolerance, in case \(\ell = 0.2\) is entirely ruled out due to e.g. poor emulation), for any of flow1/flow2/pressure.
If all of these ‘best 5%s’ were unique, then ‘NROY’ would consist of 75% of space (5 values of \(\ell\) \(\times\) 3 outputs \(\times\) 5%). In fact, it is 13.113% - unsurprisingly, there’s overlap.
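The union construction can be sketched with placeholder implausibility values, one column per (output, \(\ell\)) combination; the resulting fraction is at most 75% by construction.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100000                                    # candidate parameter settings
# placeholder implausibilities: 3 outputs x 5 values of ell <= 0.2 = 15 columns
impl = rng.normal(size=(n, 15))

keep = np.zeros(n, dtype=bool)
for j in range(impl.shape[1]):
    cutoff = np.quantile(impl[:, j], 0.05)    # lowest 5% for this combination
    keep |= impl[:, j] <= cutoff              # union over all 15 'best 5%' sets

nroy_fraction = keep.mean()                   # < 0.75 whenever the sets overlap
```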
What does this space look like?
For wave 2, sampled 100 runs from the set of best (better) runs, with the design chosen to maximise the minimum distance between points. Plotting these 100 new runs (in turquoise, over red wave 1):
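The exact design algorithm isn’t specified here; one common heuristic for maximising the minimum inter-point distance is the greedy ‘farthest point’ selection sketched below, applied to a placeholder candidate set.

```python
import numpy as np

def greedy_maximin(candidates, k, seed=0):
    """Greedily pick k rows: each step adds the candidate farthest
    from all points chosen so far (a maximin heuristic)."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(candidates)))]
    dmin = np.linalg.norm(candidates - candidates[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dmin))            # farthest remaining point
        chosen.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(candidates - candidates[nxt], axis=1))
    return np.array(chosen)

cands = np.random.default_rng(5).random((5000, 4))   # placeholder NROY sample
design = cands[greedy_maximin(cands, 100)]           # 100 spread-out runs
```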
We seem to have been reasonably successful in identifying runs that are more similar to the observations, whilst hopefully not over-constraining - there’s still some variety in patterns.
At wave 1, we didn’t have residuals that looked like the true residual (green, unknown), but at wave 2, we’ve successfully identified residuals similar to this:
Successfully zoomed in further - the new runs at wave 3 are a subset of the spread at wave 2 (and, it turns out, relatively consistent with the true unknown model run, green line):
Residuals at this wave still contain the ‘true’ residual (green line) - which was unknown, so the method seems to be doing quite well at finding it!
Fitted emulator using the 300 runs across the 3 waves.
Plotting the 500 runs submitted, along with the values of \(\sigma^2, \ell\):
## kMV alpha lrrA lrrV
## Min. :209305 Min. :0.8747 Min. :20.46 Min. :21.59
## 1st Qu.:240866 1st Qu.:0.8821 1st Qu.:29.04 1st Qu.:27.24
## Median :256623 Median :0.8839 Median :31.26 Median :28.31
## Mean :257681 Mean :0.8838 Mean :31.44 Mean :28.68
## 3rd Qu.:271285 3rd Qu.:0.8857 3rd Qu.:33.69 3rd Qu.:30.20
## Max. :298566 Max. :0.8894 Max. :39.61 Max. :36.71
## l var Type
## Min. :0.0900 Min. : 4.681 Length:500
## 1st Qu.:0.1100 1st Qu.: 8.314 Class :character
## Median :0.1300 Median :13.063 Mode :character
## Mean :0.1246 Mean :12.393
## 3rd Qu.:0.1300 3rd Qu.:14.344
## Max. :0.1500 Max. :22.142
## kMV alpha lrrA lrrV l var
## 2.5% 223697.9 0.8780456 25.11774 24.29051 0.10 6.363513
## 97.5% 294889.3 0.8880182 38.45852 33.60855 0.15 20.608741
I.e. the results we would get if we used the known error (\(\ell = 0.1, \sigma^2 = 5.8\)).
The bound, and a summary of the implausibilities for flow1/flow2/pressure in turn:
## [1] 598.1784
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 512.5 532.7 548.8 551.1 567.2 644.4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 533.5 684.1 770.8 787.5 877.8 1344.8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 591.3 624.9 727.9 898.4 1034.1 3381.6
Proportion of parameter space not ruled out for flow1/flow2/pressure:
## [1] 0.98242 0.08657 0.06170
Proportion of runs that are in NROY for all 3 outputs:
## [1] 0.04457
Considering the true implausibility across the waves, i.e. did we actually improve despite the unknown error?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 519.3 558.5 585.5 615.1 644.4 888.6
Plotting submitted 500 runs (bright green), NROY using true error parameters (blue):
## kMV alpha lrrA lrrV
## 5818 238755.3 0.8824054 29.01729 28.24244
## kMV alpha lrrA lrrV l var
## 1 250000 0.885 35 25 0.1 5.8
This is not a perfect comparison, as NROY here includes any runs within the tolerance, with no weighting by ‘closeness’. The 500 runs were sampled via rejection sampling based on their likelihood, hence weighted (so runs towards the edges of NROY, closer to the bound and hence less likely, are not captured by these 500 runs).
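One way the likelihood-weighted rejection step could look; the log-likelihoods here are placeholders, and each candidate is accepted with probability proportional to its likelihood.

```python
import numpy as np

rng = np.random.default_rng(6)
log_lik = -0.5 * rng.chisquare(5, size=200000)     # placeholder log-likelihoods
u = rng.random(len(log_lik))
accept = np.log(u) < (log_lik - log_lik.max())     # accept w.p. L(x) / L_max
kept = np.flatnonzero(accept)                      # indices of the weighted sample
```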
If we consider reducing the bound, do we get closer to the truth? I.e. might BC work well?
## [1] 0.00093
Changing the bound so that we go from 4.5% of runs in NROY to only 0.1% gives posteriors that look great for lrrA and lrrV, but are off for alpha and kMV (which looked good before!).
Looking at the best few runs vs the truth: generally too high for kMV/alpha, decent for the others:
## kMV alpha lrrA lrrV
## 14015 293285.2 0.8876411 38.49479 25.15424
## 23487 253490.2 0.8854755 32.21651 27.74173
## 33995 285087.4 0.8872972 37.02594 26.43998
## 48932 282562.9 0.8882096 36.49534 24.97335
## 49529 289042.6 0.8866530 39.36475 23.63969
## 84897 295526.4 0.8878309 37.97566 25.15701
## 88814 295582.3 0.8885426 36.88157 25.62080
## 98611 282052.3 0.8883698 36.92142 25.45046
## kMV alpha lrrA lrrV l var
## 1 250000 0.885 35 25 0.1 5.8
Possibly suggesting that the issue is NOT the calibration method, but the fact that the emulator is not yet ‘perfect’ around the truth.